CSDE Science Core Tips: The Challenges of Survey Fraud and Tools for Combating Fraud
Posted: 4/19/2021 (CSDE Research)
This week, CSDE Training Director and Research Scientist Christine Leibbrand writes about identifying and preventing fraud in survey research. Her recommendations are based on her work on a multi-city survey in the US of families with young children, and the many conversations she has had with experts in survey research. So how do you prevent survey fraud from occurring and, when it does happen, how do you identify it?
Identifying and Preventing Fraud in Survey Research: Lessons from someone who has really been there
By: Christine Leibbrand
Experiencing fraud when you are fielding an online survey is a little like being in a horror movie. Everything seems great at first: you open your survey to participants and watch the participant counts steadily tick up. It almost seems too good to be true that there could be this much interest in your survey! Except soon you realize that it is too good to be true. The number of participants who have taken your survey spikes dramatically, perhaps going from a handful to hundreds in an afternoon, and suddenly you are in the nightmare of dealing with a survey that has been taken over by bots and a handful of people who just want to earn some extra money by taking your survey a few (or more) times. This was an experience I personally dealt with when I collaborated with other researchers on a survey of families’ experiences with COVID-19 that focused on families with 5-12-year-old children in 4 cities across the U.S. My recommendations come from that experience and the many conversations I had with experts in survey research.
Identifying and preventing survey fraud is not straightforward. Bots are just as smart as their programmers, which means they are usually really smart. The personal information bots report and their multiple-choice question responses can all make a lot of sense and be difficult to distinguish from the answers of a real person. And people who take your survey multiple times are smart too. They usually know not to use the same email address for each try or replicate their personal information across survey attempts, making it tricky to identify duplicate respondents. So how do you prevent survey fraud from occurring and, when it does happen, how do you identify it?
First, it’s helpful to know the factors that increase your risk for experiencing survey fraud. The biggest thing that increases your risk for fraud is paying participants. This is, of course, not to say that you should never pay participants. However, it is the rare (and rather malevolent) fraudster who is going to put in the time to design a bot to take over your survey without some external benefit. The more you pay participants and the faster that payment reaches them after they take a survey, the more incentive individuals and their bots will have to take the survey repeatedly. A lottery system isn’t perfect either. If you offer a relatively large prize (say more than $100), fraudsters will have an incentive to flood your survey to all but guarantee that they win. A smaller lottery prize, however, is likely to be less of an incentive. While paying participants may be necessary and potentially the most ethical thing to do if your survey is moderately time-consuming, you may think about other incentives to increase participant buy-in or at least anticipate the ways that paying participants may increase your risk for fraud and plan accordingly.
Another huge risk factor for experiencing fraud is advertising your survey publicly, particularly on large social media sites. Advertising your survey on social media sites such as Twitter and Facebook can be great because it increases your reach to potential participants. It is also very risky because it increases your reach to potential fraudsters. As an (imperfect) compromise, you might consider reaching out to the owners of private Facebook groups that are relevant to your survey population and asking them to advertise the survey among their members. You might also work with community groups and/or organizations that provide sampling frames for a fee. Using these resources and advertising platforms does not eliminate your risk for fraud by any means (indeed, our study experienced bot-related fraud after it was distributed to a single community group), but it can reduce your risk.
Publicly advertising your survey also often necessitates using a public survey link that anyone can click on from an advertisement or flyer. By allowing anyone to click on the link, public survey links open you up to the risk that the same person will be able to click on that link many times. Some survey platforms allow you to limit public links such that they can only be clicked on once by someone with a given IP address, though it is quite easy for fraudsters to obscure or change their IP addresses and access the survey multiple times anyway.
Finally, there are some study designs that make it unduly burdensome for many people to commit fraud. For example, if the study involves one-on-one follow-ups via video chat or phone call before a payment is sent out, a fraudster is unlikely to be able to commit to 50 phone calls or video chats and it would hopefully become quite apparent to a researcher if they tried to do so. Implementing one-on-one follow-ups is a more burdensome strategy for researchers, however, and should be implemented on the basis of its usefulness for the research questions. Consequently, to prevent fraud from occurring in the first place, you might think critically about participant compensation, how you’ll be advertising your survey, and whether you might creatively interact with participants or verify identities.
Suppose you’ve thought through all that, but you still need to pay participants and advertise your survey publicly, and you won’t be able to do one-on-one follow-ups. How do you prevent fraud in your survey design and identify fraud once it (almost) inevitably happens? First, it is important to note that if you intend to not pay fraudulent participants, you will need to make sure your methods for identifying and preventing fraud are cleared by the IRB, and you will generally need IRB-approved language in your consent forms specifying that individuals will not be paid if you observe signs of fraud.
You can start preventing and identifying fraud in your screener by not revealing your screening or eligibility criteria. If you do reveal them, it is very easy for fraudsters to figure out how to make themselves eligible for your survey. Moreover, even well-meaning people can fudge their screening responses a bit in the hope of being eligible for a paid survey. Recall that in the study I collaborated on, we were surveying families with 5-12-year-old children. In the screener for our survey, we asked 3 separate questions about whether participants had 1) 0-4-year-old children, 2) 5-12-year-old children, and 3) 13-18-year-old children. When we asked where participants were located, we included cities where we were not collecting data or advertising the survey, and we asked for respondents’ zip codes. Thus, getting into the survey took extra effort for those who did not know what child age ranges or sites we were looking for. These steps screened out 46.5% of the 13,561 participants who entered our survey.
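The screener logic described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the field names (`has_child_5_12`, `city`) and the placeholder city names are hypothetical, and the original study's actual screener rules (including how zip codes were used) are not specified here.

```python
# Hypothetical study sites; the real survey covered 4 US cities not named here.
TARGET_CITIES = {"City A", "City B", "City C", "City D"}

def passes_screener(answers):
    """Return True only for respondents with a 5-12-year-old child in a study city.

    The questions about 0-4 and 13-18-year-old children are still asked,
    but they act as decoys and do not affect eligibility.
    """
    has_target_child = answers.get("has_child_5_12") is True
    in_target_city = answers.get("city") in TARGET_CITIES
    return has_target_child and in_target_city
```

Because every respondent sees the same three child-age questions and the same long city list, nothing on the page reveals which answers lead to eligibility.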
Next, you might consider implementing consistency checks, where you ask participants about important information multiple times throughout the survey. For example, in our study, we asked respondents in the screener if they had a 5-12-year-old child. We then asked for the birth year of the child at the beginning of the survey, we asked for the child’s school grade, and at the end of the survey (which generally took about 30 minutes) we asked for the child’s age. Those who reported child ages and birth years that were inconsistent with one another by more than 2 years were flagged, as were those who reported a child birth year that was over 2 years away from the typical age range for that grade. For people who are pretending to be someone they are not and for bots that are not as carefully coded, these consistency checks can be helpful. For us, they flagged about 8% of the participants who made it past the screener and completed the survey as likely fraudulent.
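As a rough sketch, the two consistency rules above could be coded like this. The survey year, the grade-to-age mapping, and all function and field names are assumptions made for illustration; only the 2-year tolerance comes from the description above.

```python
SURVEY_YEAR = 2020  # assumption: the year the survey was fielded

def typical_age_range(grade):
    """Assumed typical ages for a US school grade (0 = kindergarten)."""
    return grade + 5, grade + 6

def consistency_flags(birth_year, reported_age, grade, tolerance=2):
    """Flag answers that disagree with each other by more than `tolerance` years."""
    implied_age = SURVEY_YEAR - birth_year

    # Rule 1: reported age vs. the age implied by the birth year.
    age_mismatch = abs(implied_age - reported_age) > tolerance

    # Rule 2: distance between the implied age and the typical range for the grade.
    low, high = typical_age_range(grade)
    if implied_age < low:
        gap = low - implied_age
    elif implied_age > high:
        gap = implied_age - high
    else:
        gap = 0
    grade_mismatch = gap > tolerance

    return {"age_vs_birth_year": age_mismatch, "grade_vs_birth_year": grade_mismatch}
```

The tolerance is deliberately generous so that honest respondents who round an age or have a child who skipped or repeated a grade are not flagged.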
Effort checks, which flag those who did not seem to put “good faith” effort into the survey, can also reveal helpful signs of fraud. For example, attention check questions that ask participants whether they did something very unrealistic recently (such as “swam with sharks in the past week” or “won more than a million dollars in the lottery”) can catch people/bots who are not paying attention to the questions they are answering. Do make sure that these questions are very unrealistic, however. You might also consider screening out people who answer very few of your questions (say 2/3 or less of the survey questions), while still specifying that participants may skip any particular question they would like. This screens out those who quickly page through the survey hoping to get paid for just their email address.
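The two effort checks above might be sketched as follows. The question keys and the "yes" coding are hypothetical; the 2/3 cutoff follows the example in the text, and the function assumes a non-empty list of question IDs.

```python
from fractions import Fraction

# Hypothetical keys for the unrealistic attention-check items.
ATTENTION_ITEMS = ("swam_with_sharks_past_week", "won_million_in_lottery")

def effort_flags(responses, question_ids, max_low_fraction=Fraction(2, 3)):
    """Flag failed attention checks and respondents who answered 2/3 or less."""
    # Saying "yes" to any unrealistic item counts as a failed attention check.
    failed_attention = any(responses.get(item) == "yes" for item in ATTENTION_ITEMS)

    # Count substantive questions with a non-empty answer.
    answered = sum(1 for q in question_ids if responses.get(q) not in (None, ""))
    low_completion = Fraction(answered, len(question_ids)) <= max_low_fraction

    return {"failed_attention_check": failed_attention, "low_completion": low_completion}
```

Using `Fraction` rather than floating point keeps the "exactly 2/3" boundary case unambiguous.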
As noted above, fraudsters will rarely reuse email addresses or phone numbers across survey attempts. However, some survey platforms allow you to collect IP addresses and/or latitude and longitude. Screening out duplicates based on this information tends to be more successful, though keep in mind that IP addresses can be obscured or changed, and many fraudsters will know to do this. Collecting this information also raises important ethical questions about identifiability and privacy that should be weighed.
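A duplicate check on IP addresses can be very small. The sketch below assumes each record is a dictionary with hypothetical `id` and `ip` keys; the same pattern would apply to rounded latitude/longitude pairs.

```python
from collections import Counter

def flag_duplicate_ips(records):
    """Return the ids of all responses whose IP address appears more than once."""
    # Ignore records where no IP address was collected.
    counts = Counter(r["ip"] for r in records if r.get("ip"))
    return [r["id"] for r in records if r.get("ip") and counts[r["ip"]] > 1]
```

A shared IP address is a reason for a closer look rather than proof of fraud, since members of one household or people on the same campus network can legitimately share an address.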
The final category of checks I’ll discuss is automation checks, which specifically help identify bots. These are the most effective checks I have personally used for identifying fraud, and they work best when your survey is long enough that you would expect a decent amount of variation in the time it takes people to complete it. They also work best when your survey has multiple survey modules (for example, a module on personal demographics, a module on income, a household roster module, etc.). For these checks, you look for patterns in the amount of time it took a respondent to complete the survey or portions of it. For example, observing multiple participants who started and stopped the survey at the same time on the same day (say within one minute of one another) is very sketchy, especially when participants tend to take between 25 and 40 minutes to complete your survey (as was the case for us). We also observed many cases where multiple respondents would spend the exact same amount of time (within milliseconds) on multiple survey modules across the survey. Screening out individuals who spend an unrealistically short amount of time on your survey is another good way to catch bots as well as those who are taking the survey without paying attention to the questions in order to get paid. Some survey software can also ask “secret” or “hidden” questions that regular participants cannot see, but that bots will respond to.
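The three timing checks above can be sketched together. Everything named here is an assumption for illustration: the record layout, the 10-minute "too fast" cutoff (the text does not give the threshold we used), and the simplification that module times must match exactly on every module rather than on "all or almost all" of them.

```python
from collections import Counter
from datetime import timedelta

def timing_flags(records, min_minutes=10, window=timedelta(minutes=1)):
    """records: dicts with 'id', 'start', 'end' (datetime) and 'module_seconds' (sequence)."""
    # How many respondents share each exact pattern of per-module durations.
    module_counts = Counter(tuple(r["module_seconds"]) for r in records)

    flags = {}
    for r in records:
        # Check 1: unrealistically short total duration.
        too_fast = (r["end"] - r["start"]) < timedelta(minutes=min_minutes)

        # Check 2: another respondent started AND ended within `window` on the same day.
        shared_window = any(
            other is not r
            and other["start"].date() == r["start"].date()
            and abs(other["start"] - r["start"]) <= window
            and abs(other["end"] - r["end"]) <= window
            for other in records
        )

        # Check 3: identical time spent on every module as some other respondent.
        same_modules = module_counts[tuple(r["module_seconds"])] > 1

        flags[r["id"]] = {
            "too_fast": too_fast,
            "shared_start_end_window": shared_window,
            "matching_module_times": same_modules,
        }
    return flags
```

The pairwise comparison in the second check is quadratic in the number of respondents, which is fine for a few thousand responses; sorting by start time first would be a sensible refinement for larger samples.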
In our study, we found that of the 6,205 participants who made it past the screener and completed the survey, 27.37% (n = 1,698 participants) spent an unrealistically short amount of time on the survey, 38.41% (n = 2,383) started and ended the survey within 1 minute of another respondent on the same day, and 18.7% (n = 1,158) of respondents spent the same amount of time on all or almost all of the survey modules as another respondent. We later learned that 47 of our observations were definitely fraudulent and came from the same person who had likely created a bot. 44 of those observations were flagged as fraudulent because they spent an unrealistically short amount of time on the survey, 34 started and stopped the survey within one minute of another “respondent” on the same day, and 43 spent the same amount of time on all or almost all of the modules as another “respondent.” Our other fraud criteria (duplicate IP addresses, duplicate email addresses, consistency checks, screening criteria, etc.) did not catch these cases of fraud.
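To put the 47 confirmed cases in perspective, the share caught by each timing check can be computed directly from the counts above. The function and dictionary names are just illustrative.

```python
KNOWN_FRAUD = 47  # confirmed fraudulent observations reported above

# Number of the 47 confirmed cases caught by each timing check.
CAUGHT = {
    "short_duration": 44,
    "shared_start_end_window": 34,
    "matching_module_times": 43,
}

def detection_rate(n_caught, n_known=KNOWN_FRAUD):
    """Share of the known fraudulent cases that a given check flagged."""
    return n_caught / n_known

for check, n in CAUGHT.items():
    print(f"{check}: {n}/{KNOWN_FRAUD} = {detection_rate(n):.1%}")
# short_duration: 44/47 = 93.6%
# shared_start_end_window: 34/47 = 72.3%
# matching_module_times: 43/47 = 91.5%
```

These rates describe only this one confirmed cluster, so they should not be read as general accuracy figures for the checks.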
These are just a few of the many potential strategies you might use to prevent and identify survey fraud. More are discussed in my workshop on survey fraud. However, if there is one thing to take away from this, it’s that survey fraud is not rare. Experiencing it can feel embarrassing and massively disappointing, which means it is rarely talked about despite how often it occurs. The best thing you can do is to spend dedicated time before going out into the field reflecting on potential risks and strategizing ways to identify fraud that are appropriate to your study design and research questions. And, as always, CSDE is here to offer support and consultation for questions you might have.